Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries

We develop a method, SBayesRC, that integrates genome-wide association study (GWAS) summary statistics with functional genomic annotations to improve polygenic prediction of complex traits. Our method is scalable to whole-genome variant analysis and refines signals from functional annotations by allowing them to affect both causal variant probability and causal effect distribution. We analyze 50 complex traits and diseases using ∼7 million common single-nucleotide polymorphisms (SNPs) and 96 annotations. SBayesRC improves prediction accuracy by 14% in European ancestry and up to 34% in cross-ancestry prediction compared to the baseline method SBayesR, which does not use annotations, and outperforms other methods, including LDpred2, LDpred-funct, MegaPRS, PolyPred-S and PRS-CSx. Investigation of factors affecting prediction accuracy identifies a significant interaction between SNP density and annotation information, suggesting whole-genome sequence variants with annotations may further improve prediction. Functional partitioning analysis highlights a major contribution of evolutionary constrained regions to prediction accuracy and the largest per-SNP contribution from nonsynonymous SNPs.


Table of Contents
To simultaneously analyse all common variants, SBayesRC utilises a low-rank model based on the eigen-decomposition on quasi-independent LD blocks in the human genome 1 .In each LD block k, the summary-data-based model (as used in SBayesR) is transformed into a new model where the joint effects of mk SNPs are fitted to qk linear combinations of marginal SNP effects (instead of mk marginal SNP effects), with qk being the number of top principal components (PCs) that collectively explain at least a given proportion () of variance in the LD matrix (Extended Data Fig. 1a).In this model, the dimension of the system of equations is qk  mk, with qk much smaller than mk, which reduces memory consumption and speeds up computation significantly.For instance, when  = 99.5%,qk/mk ≈ 0.2 on average across LD blocks for the 7.4 million common SNPs used in this study (Supplementary Fig. 1).
Essentially, the low-rank model refines the signals in GWAS summary statistics by collapsing information from SNPs in high LD.In addition to the computational advantages, the low-rank transformation enhances model robustness by 1) removing small PCs that are subject to LD differences between GWAS and reference samples, and 2) enabling a Gibbs sampling algorithm to directly estimate the residual variance, a nuisance parameter accommodating remaining LD variations and potential errors in GWAS summary data.
We derive a summary-data-based low-rank model from a general form of individual-level linear regression.Consider model where y is the vector of trait phenotypes adjusted for covariates, such as sex, age and principal components (PCs), X is the genotype matrix of m SNPs standardised to have column mean zero and variance one,  is the vector of true SNP effects, and  are the residuals with () =   2 .Let N be the sample size.Multiplying both sides of the equation by .It is often neither feasible nor necessary to compute the wholegenome LD matrix.Alternatively, we compute R for each of the LD blocks that are found to be approximately independent in the human population.In this case, the genome-wide LD matrix is a block-diagonal matrix with blocks defined by LD blocks.The eigendecomposition of Ri for block i, which can be performed independently and in parallel between block, is (the subscript is ignored for simplicity in notation)

𝐑 = 𝐔𝚲𝐔′
where U is the matrix of eigenvectors and  is the diagonal matrix of eigenvalues.
Substitution of R in the equation above gives Due to LD between SNPs and limited sample size, the LD matrix estimated from a reference sample is often rank deficient.In this case, a number of eigenvalues are zero.Additionally, small eigenvalues are subject to sampling variation in LD between GWAS and LD reference samples.To this end, we partition  into

𝚲 = [ 𝚲 𝑞 𝟎 𝟎 𝚲 0 ]
where   contains q eigenvalues in descending order that cumulatively explain at least a given proportion () of variance in LD, i.e.,  = where Λ  is the i th nonzero eigenvalue, and  0 contains remaining eigenvalues including zeros.Then the model can be written as To remove the noise in LD, we discard the equations for  0 , resulting in a low-rank model: where   has a dimension of  ×  with  ≪ .In essence, the true SNP effects are fitted to a smaller number of effective data points rather than the observed data points that are highly correlated to each other.This model is general and can be applied with different assumptions on the distribution of SNP effects .

Advantages of using low-rank model
We proposed a rank reduction approach to account for correlations between marginal effects, allowing the joint effects of genome-wide SNPs to be fitted to a set of observables with independent residuals.There are several advantages of using this low-rank model.First, it substantially improved the model robustness.We showed that our method is robust to LD differences between GWAS and reference samples, as well as per-SNP sample size variations resulted from the meta-analysis between cohorts with different genotyping platforms.As shown in the real trait analysis (Fig. 3b, Fig. 4c,d, and Extended Data Fig. 7), SBayesRC consistently yielded robust prediction accuracy, when using an external LD reference independent of the GWAS sample, outperforming other methods.We recommend using insample LD for improved prediction accuracy, but when this is not possible, SBayesRC is expected to exhibit greater robustness compared to other methods.The low ranking is not only due to the elimination of many small eigenvalues/eigenvectors in each LD block, which are subject to high sampling variation in LD, but also because the independence of transformed summary statistics enables the sampling of block-wise residual variance, which introduces a mechanism to manage the convergence issue caused by violations of model assumptions (Supplementary Note 3).It has been found that SNP effect sizes can blow up during MCMC when the model fails to converge.In this case, the sampled values of residual variances would be large, causing the SNP effect sizes to shrink back toward zero, thereby preventing convergence failure.Second, the low-rank model substantially improved the computational efficiency, which allows us to fit a large number of SNPs with only a small fraction of computation resource compared to the original model.In theory, SBayesRC is scalable to fit variants at the sequence level via calibrating the required proportion of variance in LD explained by the selected eigenvalues/eigenvectors, making it a powerful tool to analyse the incoming large-scale whole-genome sequence data.Third, this low-rank model depicts a general framework that can be applied with different priors, as used in other methods.

Estimation of residual variance helps to improve model robustness
In the summary-data-based model, Eq (1), () = exist large LD differences between GWAS and LD reference samples.This is because using summary statistics from GWAS and inaccurate LD data from a reference is analogous to using estimated genotype data with noise in the fitted model for GWAS: where  ̂ is the combination of genotypes used in GWAS (X) and the differences to those observed from the reference (), and  ̂ is the ordinary least squares estimate for the SNP effect.It can be seen that the new residual in the above model can have variance larger than the phenotypic variance when the noise in the genotype data is large, i.e., (Δ ̂+ e) > Var().Thus, it would be beneficial to estimate the residual variance from the data given the GWAS summary statistics and reference LD data.
In contrast to Eq (1), it is very straightforward to estimate the residual variance in the lowrank model, Eq (2), because the residuals are independently distributed, () = with R being a mm matrix and  being mutually correlated with a variance-covariance matrix proportional to R. In each LD block, we perform eigen-decomposition on R such that where U and  are matrices of eigenvectors and eigenvalues, respectively.The low-rank model is constructed such that the dimension of observed data (b and R) is reduced but the dimension of parameters () remains.In essence, it fits q observables with m SNP joint effects, where q<<m and the observables are correlated only through SNP joint effects but not residuals: with  being a q1 vector and  being a qm matrix.As shown in our simulation and real data analyses, the low-rank model approximates the standard model with high accuracy, when the choice of q is well calibrated, and has considerable improvement in computational efficiency and robustness.Note that an alternative reduction is to reduce the dimension of parameter space, that is, from Eq (3) we can have where U is a mq matrix and  * = ′ becomes a q1 vector that captures the principal component effects.In this case, the Bayesian machinery to learn about the different weights associated with functional annotations would not work because each element of  * is not the SNP joint effect itself but a linear combination of SNP joint effects.
In summary, the Bayesian learning about functional annotations works in the low-rank model because the low rank approximation still captures >99% of the variance.However, it is expected that with a lower eigenvalue threshold (e.g., 50%), the performance would be compromised.

Alternative parameterization for 𝝅
To remove the dependence between elements of   for each SNP, we employed an alternative parameterization for modelling membership probabilities and annotation effects.
Let   be the indicator for the mixture component membership for SNP j: =  with probability   ;  = 1 to 5 We define a conditional probability that the SNP effect belongs to the k th distribution given that it has passed the bar for the (k-1) th distribution as We then apply the generalised linear model to link   with   , i.e., In this parameterisation, all   are independent, which means that   can be sampled in parallel in each MCMC iteration, and   can be sampled from its full conditional distribution using Gibbs sampling algorithm, following the algorithm of Albert and Chib 2 .
Let   be the indicator variable for whether the SNP effect can "climb" up to a higher distribution, i.e.,

𝑧 𝑗𝑘 ~ 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝 𝑗𝑘 )
To allow Gibbs sampling, a probit link function is chosen, namely where Φ(•) is the cumulative density function (CDF) of the standard normal distribution.It has been shown that with an auxiliary variable   defined as a linear model can be constructed as with   ~ (0,1).In this model, given a normal prior distribution,   ~ (0,    2 ), the full conditional distribution for   is a univariate normal distribution, since   is conditionally independent of   given   .Given   and   , the full conditional distribution for the latent variable   is a truncated normal distribution.The Gibbs sampling procedure is described in the following section.

Scaling the SNP marginal effect estimates
The derivation for the summary-data-based model is based on the marginal effects in units of per standardized genotype (b).When the marginal effects were estimated from GWAS using genotypes at 0/1/2 scale (b * ), b can be estimated, in a scalar form, by where   2 is the phenotypic variance,   is the per-SNP sample size, and   is the standard error for SNP j.If the trait phenotypes are not standardized, the phenotypic variance can be estimated by taking the median value of across SNPs, where   is the allele frequency in the GWAS sample (ref 3,4 ).The per-SNP sample size   can be replaced by the overall sample size N. Here, we assume   2 = 1, then and will scale the joint effect estimate   back to the phenotypic scale using the same   so that this parsimonious assumption would not have an impact on the result (ref. 5,6).

Estimation of SNP-based heritability and per-SNP heritability enrichment for each annotation
The total genetic variance is We calculate this quantity in each of MCMC iterations given the sampled values of SNP effects .Assuming unit phenotypic variance, the SNP-based heritability ℎ SNP 2 =   2 , estimated by the posterior mean of MCMC samples discarding the samples from the burn-in period.
For a binary annotation c, the total variance explained by the SNPs within the annotation is calculated as where   is the number of SNPs within the annotation.

MCMC sampling scheme
We use MCMC sampling to draw posterior inference on the model parameters.The joint distribution of data and all parameters in the low-rank model is Suppose   is the indicator variable to the distribution membership of   .The full conditional distribution for   is where The full conditional distribution for   is where (|  = , ,   2 ,   2 ) is shown as above and (  = ) =   .
As described above,   is a function of   , and   is a linear model of   through the probit link, i.e., Here, we introduce another indicator variable   , where and a latent variable   , for which the full conditional distribution is Similar to the sampling of SNP effects, we use single-site Gibbs sampler to sample the annotation effects.The full conditional distribution for   is where The full conditional distribution for    2 is where  ̃ =  +   and ̃ 2 = (  ′   +     2 )/ ̃.

Calibrating parameters for the low-rank model
We used simulation to calibrate our low-rank model which requires a specification of two parameters: the minimum width (cM) of a LD block after merging contiguous small quasiindependent LD blocks 1 , and the minimum proportion () of variance in the LD matrix explained by the selected PCs in each LD block.We simulated GWAS data using 1,154,522 SNPs on 328,501 individuals of European ancestry in the UK Biobank 7 (UKB) and calculated the prediction accuracy in a hold-out sample of 14,000 individuals.The results showed that the prediction accuracy increased slightly with the LD block width, reaching to a plateau at 4cM (Supplementary Fig. 2).Consequently, we used a minimum block width of 4cM in the subsequent analysis.The other parameter  had a greater impact on the prediction accuracy, for which the optimal value would, in principle, depend on the data used for GWAS and LD reference.To determine the best  value automatically, we employed a pseudo-validation approach based on the observed GWAS summary statistics and the result of eigen-decomposition on the LD matrix as mentioned below (Supplementary Note 10).

Tuning for the optimal eigenvalue cut-off
Our low-rank model requires a cut-off value for the minimal proportion () of variance explained by the eigenvalues in each LD block.The optimal choice of  was determined by a summary-data-based validation approach, following the strategy in Zhang et al 8 .First, we constructed a set of summary statistics for training and a set for validation, conditional on the observed GWAS summary statistics and the LD reference data.Second, we carried out multiple short runs (150 iterations) of SBayesRC analysis (without annotations) to estimate the SNP joint effects using the training summary statistics and different  values ( = [0.9,0.95, 0.99, 0.995]′ by default).Third, we computed the prediction accuracy using the validation summary statistics.Finally, we used the  that gave the highest prediction accuracy in the formal analysis.In contrast to Zhang et al, where the strategy of summarydata-based validation was used to tune the model parameters (e.g., ), we only used this strategy to find the optimal  value and still estimate all model parameters from the data.In addition, in Zhang et al, individual-level genotype data from the reference are required, whereas our approach, as described below, only requires the LD correlation matrix without a need to access to the genotype data.To obtain samples from this multivariate normal distribution efficiently, we make use of the eigen-decomposition results for each LD block.That is,

𝑅 = 𝑈Λ𝑈′
where  and Λ are matrices of eigenvectors and eigenvalues with the maximum cut-off of variance explain ( = 0.995 by default).Let  is a vector of standard normal random numbers with the size equal to the number of kept eigenvalues, i.e.,   ~(0, 1) Then, The validation data set therefore is The marginal standard error for   and   is simply √1/  and √1/  , respectively.
Calculation of prediction accuracy based on summary statistics.We compute the prediction accuracy as the correlation of phenotypes and PRS in the validation sample, i.e.,
We calculate the pseudo-validation R in each chain.If R>0 and |R / R from 0.995| is larger than 1.25, SBayesRC will switch the  to the new value from the default of 0.995.If the largest R is from  = 0.9, then we will prompt the user to further expand the grid to 0.8, 0.7, 0.6 or check the QC of the summary data.

Violation of model assumptions
There are at least two important assumptions implied in the general form of summary-databased models 6 .One assumption is that the LD correlation matrix calculated from the reference sample is consistent with that from the GWAS sample, which is violated when the LD reference has a too small sample size (i.e., large sampling variation in LD) or is genetically different from the GWAS sample.Another assumption is that the summary statistics are derived from the same set of individuals for all SNPs, which may not hold when the summary statistics are obtained from a meta-analyses where different SNP genotyping panels, imputation references or quality control (QC) procedures are used in different cohorts.Failure to satisfy these assumptions can result in severe model misspecifications.In SBayesRC (or SBayesRC without annotations), we aim to account for the heterogeneity in both LD and per-SNP sample size by removing those principal components with the smallest eigenvalues in the LD matrix and estimating the residual variance from the data (Methods).
We performed genome-wide simulations based on the imputed SNP data in the UKB to assess the robustness of our method to model misspecifications, in comparison of state-ofthe-art methods including LDpred2 5 and SBayesR.

Sensitivity analysis in simulation for SBayesRC
For sensitivity analysis, we conducted additional runs of SBayesRC using different mixture distribution scaling factors or the number of mixture components, which deviated from the true values used in the simulation.We observed only negligible changes in the prediction accuracy, indicating that SBayesRC is highly robust to these variations in model specifications (Supplementary Fig. 9 and 10).Moreover, to assess the performance of SBayesRC under an alternative data-generative model, we simulated data where the SNP effect variance was modelled as a weighted sum of components based on functional annotations, akin to a model of S-LDSC 11 or MegaPRS 8 .Even in this scenario, SBayesRC demonstrated superior performance (Supplementary Note 13 and Supplementary Fig. 10).

Simulation under the S-LDSC/MegaPRS model
We have performed additional simulations where the variance of effect sizes is additively linked to the functional annotations (similar to the LDAK-BayesR-SS or S-LDSC model).
The data generative model is: with   being the observed annotation per SNP and   2 being the annotation heritability enrichment parameter from height, and  1 ,  2 ,  3 ,  4 are set to be 0.998, 1.8910 -3 , 9.2610 -6 , and 1.9710 -9 , respectively.The trait heritability was set to be 0.1 or 0.5.The pattern of prediction accuracy from using 1M or 7M SNPs and with or without annotations in SBayesRC is similar between the two data generative models (SBayesRC model or S-LDSC model for simulation).In both cases, SBayesRC with 7M SNPs and annotations gave the highest prediction accuracy.We noted that overall the prediction accuracy using the S-LDSC model in the simulation was higher than that using the SBayesRC model in the simulation.This may be because, under the S-LDSC model, some causal variants can have a large causal effect, leading to a large discovery power from GWAS, and therefore higher prediction accuracy.In this case, the added value from the annotations became trivial.

Difference between the LDSC/MegaPRS model and SBayesRC model
S-LDSC 12 assumes that each SNP effect has a univariate normal distribution and is not from a mixture model.In this case, it is reasonable to model the SNP effect variance as a weighted sum of components with respect to functional annotations, i.e.,   ~ (0,   2 ), with where   is the value of annotation c at SNP j and   2 is the heritability enrichment in annotation c.MegaPRS 10 is a tool comprising multiple summary-data-based prediction methods, including an infinitesimal model (LDAK-Ridge-SS), a two-normal mixture model (LDAK-Bolt-SS), and a multi-component mixture model (LDAK-BayesR-SS).LDAK-Ridge-SS can be regarded as an equivalent model as S-LDSC.However, in LDAK-BayesR-SS,   ~  1  0 +  2 (0,   2 /100) +  3 (0,   2 /10) +  4 (0,   2 ), the biological interpretation is not straightforward.For example,  2 cannot be interpreted as the proportion of SNPs in the small effect size category because whether the effect size is small or large depends on   2 which is annotation dependent.This may potentially affect the identifiability of the mixture membership if annotation-specific heritability enrichment parameters   2 are estimated jointly with the SNP effects   (instead, a two-step estimation approach is used in the MegaPRS paper, where   2 are estimated by BLD-LDAK model first and then treated as unknown quantities in LDAK-BayesR-SS).
To improve the model interpretability and potentially identifiability and robustness, we assume that   ~  1  0 +  2 (0,   2 /10000) +  3 (0,   2 /1000) +  4 (0,   2 /100) where the mixture distributions are independent of annotations, which represent zero, small, medium and large effect sizes that, in order, explains 0, 0.01%, 0.1% and 1% of the total genetic variance (  2 ).The mixture distribution membership probabilities,  1 ,  2 ,  3 ,  4 , are dependent of annotations such that the interpretation for the annotation effects are straightforward, that is, the higher value of the annotation would increase or decrease the probability of the SNP with an effect belong to a given effect size distribution.Since the model is clearly defined, there is no identifiability problem when estimating SNP effects and annotation effects altogether, which we believe is favourable compared to the two-stage estimation approach.

Comparison between SBayesRC and MegaPRS
We compared the performance of SBayesRC and MegaPRS (LDAK-BayesR-SS) model using simulations and real data.First, we simulated data based on the SBayesRC model and analysed using the SBayesRC or MegaPRS methods.Second, we simulated data based on the MegaPRS model (or equivalently, the S-LDSC model, when no extra LD weights are modelled) and analysed using the SBayesRC or MegaPRS methods.The details of simulation are described above.We found that SBayesRC gave a higher prediction accuracy than MegaPRS regardless of which data generative model was used in the simulation (Supplementary Fig. 11).Third, we ran MegaPRS in the 10-fold cross-validation analysis in the UKB of European ancestry, and found that, although outperformed LDpred2 and LDpredfunt, the prediction accuracy was lower than that from SBayesRC (Fig. 3a), consistent with the simulation results.Fourth, we applied MegaPRS in cross-biobank prediction analyses, including training in FinnGen and validation in UKB, and training in the data set from published meta-analysis and validation in Lifelines, across quantitative traits and diseases.
Our results showed that SBayesRC had a better predictive performance than MegaPRS, measured by the prediction accuracy relative to LDpred2 (we did not compute the prediction accuracy relative to the standard SBayesR because of the convergence issue in the standard SBayesR for some of the traits) (Fig. 3b).In particular, SBayesRC had a significant advantage over MegaPRS when using an external LD reference independent of the GWAS sample (Extended Data Fig. 7).Additionally, the advantage of SBayesRC over MegaPRS is observed with different training sample sizes for height and BMI (Fig. 3c).Moreover, our investigation in this study identified a significant interaction between SNP density and annotation information for improving polygenic prediction.We observed this beneficial interactive effect in both within-EUR and cross-ancestry prediction.We note that this is an important result that has not been concluded by the MegaPRS paper, which mainly focused on ~600K directly genotyped SNPs.Last but not least, although MegaPRS is capable of analysing 7M SNPs with annotations, it requires almost 3x larger memory than our method (Table 1).
The superiority of SBayesRC may result from three reasons.First, modelling annotations as a function of  may be a better way of incorporating annotation data, as explained above.
Second, in SBayesRC, all parameters are estimated coherently in one model, whereas MegaPRS takes heritability enrichment estimates and different sets of  values as input and do not consider the estimation variance for these parameters.Third, SBayesRC simultaneously fits all SNPs in the model, but MegaPRS only fits a subset of SNPs that are enriched in per-SNP heritability comparing to a random SNP in the genome.SNPs with depletion in heritability enrichment may still have a nonzero, albeit small, effect on the trait, and completely ignoring them may adversely affect prediction accuracy.Last, SBayesRC better accounted for the sampling variation in LD between the reference and GWAS samples through the low-rank approximation, which is likely to give higher prediction accuracy when using an external LD reference that is independent of the GWAS sample.

Other factors affecting accuracy of prediction leveraging functional annotations
Here we investigate other factors, besides SNP density, that affect accuracy of prediction leveraging functional annotations, including SNP-based heritability, GWAS sample size, properties of minor allele frequency (MAF) and LD, the number of annotations, and the strategy of analysis.
Stratifying traits based on the SNP-based heritability estimates from SBayesR found that traits with lower SNP-based heritability tended to benefit more from exploiting annotation data on 7M SNPs (regression slope = -15.7,P=0.031; Fig. 6a).When down sampling the UKB data for GWAS in height and BMI, the relative prediction accuracy using 7M SNPs and annotations was higher with a smaller GWAS sample size, while using more SNPs alone required a larger sample size to achieve a higher improvement in prediction accuracy (Fig. 6b).This is expected because including annotations in the model adds more information, whereas including more SNPs in the model alone increases the number of parameters to estimate, consuming more degrees of freedom.Given that most traits have a polygenic architecture, low SNP-based heritability or small sample size means low power in GWAS.
Thus, these results suggest that traits with limited GWAS power would benefit more from leveraging annotation data for prediction.
To investigate whether functional annotations provide additional information compared to LD and MAF properties, the relative prediction accuracy for height and BMI was evaluated by fitting various combinations of annotations.We found that the relative prediction accuracy increased with fitting 13 MAF related annotations and had no significant change when adding 4 LD bins, but further increased with additional functional annotations (Fig. 6c).Moreover, when conditional on MAF and LD bins, the improvement was larger with 96 functional annotations compared to 21 core annotations 13 .These results indicate that functional annotations are more informative than LD and MAF annotations, and using a comprehensive set of functional annotations is superior to using only a few key functional categories.
Lastly, we compared alternative strategies of analysing all common SNPs with annotations.
Simultaneous fitting of 7M SNPs in the model is computationally impractical in most MCMC-based methods.One strategy is to perform a stepwise analysis 10,14,15 , described as below.
Step 1, estimation of the impact of functional annotations on SNP effects, which is often quantified by the estimated annotation-specific enrichment in per-SNP heritability using S-LDSC 12 .
Step 2, prioritisation of all SNPs based on their functional annotations and the results from step 1.The final analysis uses a subset of the SNPs that rank from the top in Step 2. To compare our method to such a stepwise strategy, we selected the top 1M SNPs from 7M SNP set based on the results from S-LDSC 12 , and performed the prediction analysis in height and BMI.For SBayesRC, in which annotations were not assigned, the functionally prioritised 1M SNP set exhibited higher prediction accuracy than the 7M SNPs.However, for SBayesRC, in which annotations were assigned, the functionally prioritised 1M SNP set was slightly better than the same set of SNPs without re-estimation the annotation effects but was inferior to the 7M SNP set (Fig. 6d).These results suggest that the unified analysis using all 7M SNPs in the model is better than the stepwise analysis in refining the information from annotation data.

Summary data imputation
We implement the imputation method for summary data from ImpG 16 to avoid the heavy re- 19.Running settings for trans-ancestry prediction 1) Model with summary statistics from EUR only.
SBayesR family.We ran SBayesR 17 and SBayesRC using the summary statistics of UKB EUR, and then applied the estimated SNP effects directly to the genotypes of individual of SAS, EAS and AFR ancestries in the UKB.
PolyPred-S.We followed the steps in ref 20 , where we first used PolyFun + SuSiE for finemapping of 7M SNPs with functional annotations from BaseLineLF and LD data from the PolyPred program calculated from the UKB (a step called PolyFun-pred).Then, the mix SNP weights were optimised in 500 tuning samples of the target ancestry using predictors from both Polyfun-pred and SBayesR (trained in EUR).We obtained the PGS from the optimised SNP weights and the genotypes for each target ancestry (SAS, EAS, and AFR) in the UKB.
To ensure robustness and reduce sampling variation in the relative prediction accuracy, we excluded 10 traits with prediction accuracy (R 2 ) from SBayesR less than 0.05 in the crossvalidation in EUR and removed one trait (age at menarche) that failed to converge with PolyPred-S.As a result, we retained 17 traits for the trans-ancestry prediction analysis.
2) Model summary statistics from two population together.
PRS-CSx 21 , SBayesRC-multi, and MegaPRS-multi.We evaluated these methods based on their performance in predicting individuals of EAS and AFR ancestries (target ancestries), and validation samples.In addition, cross-validation provides a way to compute the standard error of the prediction accuracy for a trait, which is important when comparing differences in prediction performance between models/scenarios.For example, as shown in Fig. 5 of the main text, the improvement in prediction accuracy due to the use of annotations was significantly higher at 7M imputed SNPs than 1M HapMap3 SNPs for each trait, which can be seen from the standard error obtained from the cross-validation approach.
The reason we used 342K unrelated (genomic relationship cut-off at 0.05) individuals only is that SNPs can capture both direct and indirect genetic effects in the family as well as familyspecific environmental effects, which will, if validation samples include relatives of training samples, lead to a higher prediction accuracy than that would be expected in a random sample of the population.Prediction accuracy in an independent random sample remains the most widely accepted criterion to assess the PRS performance.Therefore, we wished to only include the unrelated individuals in the validation set and did not use data from other individuals beyond the 342K unrelated samples of European ancestry.

Lifelines Cohort Study
The

FinnGen Study
The FinnGen study is a large-scale genomics initiative that has analyzed over 500,000 Finnish biobank samples and correlated genetic variation with health data to understand disease mechanisms and predispositions.The project is a collaboration between research organisations and biobanks within Finland and international industry partners.We want to acknowledge the participants and investigators of the FinnGen study.

1 𝑁𝐗′𝐗
hand side is the GWAS marginal effect estimates b.Let  = be the LD correlation matrix.Then, we have =  + 1   ′ This is the summary-data-based model underlying many methods.Of note, in this model, the residuals have a variance-covariance structure proportional to the LD matrix, i.e., Lifelines Biobank initiative has been made possible by funding from the Dutch Ministry of Health, Welfare and Sport, the Dutch Ministry of Economic Affairs, the University Medical Center Groningen (UMCG the Netherlands), University of Groningen and the Northern Provinces of the Netherlands.The generation and management of GWAS genotype data for the Lifelines Cohort Study is supported by the UMCG Genetics Lifelines Initiative (UGLI).UGLI is partly supported by a Spinoza Grant from NWO, awarded to Cisca Wijmenga.The authors wish to acknowledge the services of the Lifelines Cohort Study, the contributing research centers delivering data to Lifelines, and all the study participants.UK Biobank This study has been conducted using UK Biobank resource under Application Number 12514.UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency.It has also had funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK.

Figure 3
Estimation of SNP-based heritability, polygenicity (the proportion of causal variants) and residual variance in SBayesRC without annotation using1M HapMap3 SNPs and different choices of LD reference for a simulated trait with heritability = 0.1 or 0.5.The dashed line in panel a and b indicates the true value in the simulation.The dashed line shows the simulated value.Each box plot shows the spread of data in 10 independent simulations: the line is the middle (median), the box covers the middle half (IQR), the whiskers extend to 1.5 times the IQR, and dots show outliers.
The per-SNP heritability enrichment (  ) is then calculated as For a quantitative annotation, the per-SNP heritability enrichment is calculated as the slope of the regression of   2 on the annotation value   E[   ] =   +     and   = 1 +   .Similarly, we compute   in every iteration of MCMC and estimate by the posterior mean after burn-in.
GWAS summary statistics, reference LD correlation matrix, functional annotation data 2 Scale the GWAS marginal effect estimate   =     * Calculate the posterior probabilities of SNP effect distribution memberships and sample   10 Sample SNP effect   from its full conditional distribution  ( Scale back the posterior mean SNP joint effects to per-allele scale by  ̂ * =    ̂ given   13 end 14 for k := 2 to number of mixture distribution components do 15 for j := 1 to number of SNPs that passed the bar for current component do 16 Sample latent variable   from a truncated normal distribution given   17 end 18 for c := 1 to number of annotations do 19 Sample annotation effect   from its full conditional distribution given 10d Zhang et al10, the covariance term can be written as a function of 184culation of the eigen decomposition for the LD matrix if some SNPs are missing from the LD panel.The imputation is based on the Z score, correlation information among missing SNPs and the typed SNPs.The Z score for the missing SNPs can be obtained from  =     −1  Where zi is the imputed Z score for missing SNPs in the GWAS summary data,   is the LD correlation matrix among the missing SNPs and typed SNPs,   is the LD correlation matrix among the typed SNPs.  is the allele frequency from reference genotype,   is the phenotypic standard derivation (square root of phenotypic variance).The phenotypic variance can be estimated by taking the median value of 2  (1 −   )[    2 + (  * ) 2 ] across SNPs, here   is the allele frequency in the GWAS sample (ref3,4).,LDpred-funct18,MegaPRS 8 , and SBayesRC. Bth 1M and 7M common SNP sets were used in these methods.For MegaPRS, the LDAK-BayesR-SS model was used for comparison, as recommended in their paper.For C+PT, a hold-out sample of 5,991 UKB individuals was used as the tunning data, same as in LDpred2.LDpred-funct used the validation sample as the tuning data, which may result in overfitting and inflate the prediction accuracy, so caution is advised.The functional annotation for LDpred-funct, MegaPRS, and We converted the   for missing SNPs to marginal effects at 0/1/2 scale (b * ) and standard error (  ) by * =     Where   is the per-SNP sample size (replaced by median per-SNP sample size of known SNPs instead), LDpred2, LDpred2 was run with the grid of models setting (96 models), and the hold-out sample of 5,991 UKB individuals with genotypes and phenotypes was used to choose the best model for prediction.C+PT